Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
ethanpailes
left a comment
There was a problem hiding this comment.
This looks pretty good to me. The main merge blockers are the lack of safety comments and the fact that I still have to patch this in on my macbook and run the tests.
We should also consider adding support for running the tests under macos in CI. I'm not sure how hard that will be to set up. I think it would be fair to punt on doing that until a followup.
|
I just patched this branch in on my macbook and ran the tests, including the single-threading flag. Unfortunately I saw a bunch of errors. @c22 is this expected? I would be ok with merging even with these errors since it's an improvement over the status quo, but we'll need to add something to the README explaining that while shpool builds on macos it's not yet an officially supported platform since the test suite doesn't pass yet. |
Ah yeah sorry, I should have been more explicit about that. A bunch of integration tests failed for me as well and bounce is still probably a bit flaky on macOS even in single threaded testing. I will diff against my side later today and see if anything stands out between your results. |
Got it. Where you thinking of trying to fix them all before merge or do you want to just get something merged and then fix the remaining tests on mac in followups? Like I said, since all the linux tests are passing I'm happy to merge even with some tests broken on mac. |
|
I checked my end and the exact same tests were failing on my machine as well, so yes, that was expected.
I was thinking of leaving them failed for now as they were also failing/skipped on seruman's builds. With some quick-fixes and workarounds, I was able to get 8 more tests to pass (2 are related to hardcoded fish/zsh paths, 6 are related to aggressive timing, eg. 500ms on Linux -> 2500ms on macOS). 4 more tests are MOTD pager related, I didn't look into them too deeply. 5 of the tests are re-attachment failures, these are a bit more concerning to me, because re-attachment is probably a key part of my use-case on Mac. Fixing that locally required changing the way I'm happy for you guys to take this as-is if you like (with the unsafe comments added), but if you're receptive to another fix, I can incorporate it as part of this change as well, I just probably won't get to it until after the weekend. |
|
One thing we could do is put The motd failures are likely because the motd crate doesn't work on mac. I certainly didn't do anything to make it work nicely on mac. Does mac even have a motd? We might want to just disable that feature for macs. It's only really useful in institutional settings where the motd has useful information. It's pretty useful within google, but I doubt I would have added the feature if it were not for google's active use of motd. I rather doubt people care about that feature too much in most other places.
Waiting a few days for at least the simple fixes seems worthwhile. Why don't you do those fixes, but don't break your back. If you can get the re-attachment fixes in those seem like they would be pretty nice to have. Just disable the motd and anything else that is too hard. Then we can follow up with some CI to keep things up to date and add a section to the README explaining the state of macos support. |
possibly the same in BSD
basically the same according to manual
File system watchers (inotify, FSEvents) report canonical paths, but ConfigWatcher was storing paths as provided by the caller. When a path contained symlinks (e.g., /var -> /private/var on macOS), event paths wouldn't match stored paths, causing config changes to be missed. Add canonicalize_path() to resolve symlinks in the existing portion of a path (handling paths where the final components don't exist yet), and apply it when storing paths in the watcher. Include a regression test that creates a symlink and verifies events are received when watching through the symlink.
The worker_ready() test helper signals when the worker is idle, but it was signaling even when there was a pending reload_deadline. This could cause tests to proceed before the debounce timeout fired, leading to flaky test results. Only signal idle when there's no pending work (reload_deadline is None). Update the debounce test to not call worker_ready() between writes, as that now correctly waits for the reload to complete.
This version includes macOS support via ptsname (vs ptsname_r). See: shell-pool/shpool_pty#4
- Forward test_hooks feature from shpool to libshpool crate - Shorten socket paths to avoid macOS 104-byte sun_path limit - Forward HOME and PATH env vars to attach subprocess (macOS typically lacks XDG_RUNTIME_DIR, needs HOME instead) - Use std::env::var directly to avoid unused import warning - Document that tests should run with --test-threads=1 on macOS due to FSEvents timing behavior under concurrent load
|
That should be most of it.. I believe the pager/motd tests are not easily fixed without some changes to shpool_pty but I can't remember what my findings were on that, I should have written them down. Most of the other issues were around race conditions and fixing them for macOS should hopefully prove beneficial to Linux test stability as well (particularly when under load or on slower machines). I didn't feel like arbitrarily increasing the wait times for the other tests, because it would increase the test times on Linux and CI/CD environments. Re-working the test invocation in to something a bit more generic (event based?) is probably a bigger task. Either way, hopefully this proves to be at least somewhat usable. |
macOS returns EINVAL when setting socket timeout on a connection where the peer has already closed (documented in setsockopt(2)). This typically happens with daemon presence probe connections.
On macOS, calling shutdown() on a Unix socket after the peer has disconnected returns ENOTCONN (error 57). This was causing the shell->client thread to exit with an error, which then caused re-attachment to fail. Add a shutdown_socket() helper that ignores ENOTCONN errors since the socket is already closed in that case.
Add worker_ready() synchronization before the rename operation, matching the pattern used in other config_watcher tests. Without this, the filesystem event from the rename could occur before the watcher is fully ready to receive it, causing flaky test failures on macOS where FSEvents timing differs from Linux's inotify.
* Wait for daemon-wrote-s2c-chunk event before sending the large command This avoids a race where the command is sent before the shell is ready to receive input. * Silence macOS bash deprecation warning via BASH_SILENCE_DEPRECATION_WARNING env var The warning contains 'd' which caused read_until(b'd') to stop early before reaching "food" in the restore buffer.
Skip 6 tests that fail on macOS due to platform differences: - prompt_prefix_zsh, prompt_prefix_fish: hard-coded Linux shell paths - motd_pager, motd_debounced_pager_*, motd_env_test_pager_*: PTY pager output doesn't reach client on macOS Update HACKING.md to document skipped tests.
| [env] | ||
| PS1 = "prompt> " | ||
| TERM = "" | ||
| BASH_SILENCE_DEPRECATION_WARNING = "1" |
There was a problem hiding this comment.
Do you guys want a comment explaining this? It's explained in my commit message.
Alternatively we can stop on a character that is not in the deprecation warning...
There is probably a more general solution to this like skipping shell integration tests when they are not installed or on the right version or something.
There was a problem hiding this comment.
Yeah a comment directly in the config file would be good. I think the way you have it is better than carefully choosing test data to avoid the conflict because it is clearer what is going on. I doubt anyone would want to change the test data in the future, but if they ever did a carefully choosen test string would be fragile in a non-obvious way.
|
Also regarding the shell integration tests that get skipped (zsh and fish), they are only really a problem on macOS because the paths are hardcoded. This would also be a problem on other *nix environments where the shells aren't where the tests expect them to be (the tests pass on macOS if you We could possibly use something like https://crates.io/crates/which to find the shell paths dynamically, but it introduces a dev dependency. Perhaps something for the future. |
Yeah this definitely seems like something that should be solveable with a little programming. I'm happy to punt on it for now though. |
| [env] | ||
| PS1 = "prompt> " | ||
| TERM = "" | ||
| BASH_SILENCE_DEPRECATION_WARNING = "1" |
There was a problem hiding this comment.
Yeah a comment directly in the config file would be good. I think the way you have it is better than carefully choosing test data to avoid the conflict because it is clearer what is going on. I doubt anyone would want to change the test data in the future, but if they ever did a carefully choosen test string would be fragile in a non-obvious way.
This work is based on the previous work by @seruman in their fork https://github.com/seruman/shpool/tree/macos
I tried to retain as much of the existing work done and base my work on top of that.
seruman's fork had socket timeouts disabled on macOS. I wasn't able to reproduce socket timeouts not working on macOS but I was seeing unhandled EINVAL errors on daemon presence probe connections (where the peer closes immediately). These are now handled gracefully, so I have dropped the changes that disabled socket timeouts.
I also ran into a bug where ConfigWatcher was not using canonicalized paths, but inotify et. al. use canonicalized paths. This likely wasn't being ran into much on Linux environments but becomes very obvious on macOS because /tmp is a symlink.
Timing on macOS is indeed tricky, the fix would involve probably significantly re-working the way test are being done, but the workaround I found was to run them in serial.
I think the rest of the changes were relatively minor.
Here's a summary of the changes:
and the cherry-picked commits from https://github.com/seruman/shpool/tree/macos